
feat: multi-model adversarial harnesses - structural hardening - security sandbox - battle-tested results #3

Open
lliWcWill wants to merge 8 commits into coleam00:main from lliWcWill:adversarial-dev-hardening

Conversation

lliWcWill commented Apr 2, 2026

Summary

  • Added two new harnesses: mixed-harness (Claude Opus generator, GPT-5.4 evaluator) and gemini-harness (Claude Opus generator, Gemini 3.1 Pro evaluator with sandboxed tool calling)
  • Hardened contract negotiation with 3-round iterative loop, fail-closed parsing, and mid-sprint renegotiation triggers
  • Built ConversationLogger that saves every agent exchange as Obsidian markdown and JSONL
  • Added 29 tests covering parseContract, renegotiation logic, parseEvalResult, and conversation logger
  • Secured the Gemini evaluator sandbox: command allowlisting, realpath symlink protection, git read-only, find -exec blocking, absolute path rejection
  • Battle-tested all four harnesses on the same real-world bug fix prompt

Results - first multi-model harness runs

| Harness | Generator | Evaluator | Result | Time |
| --- | --- | --- | --- | --- |
| claude-harness | Opus 4.6 | Opus 4.6 | 5/5 PASSED | 53.4 min |
| codex-harness | GPT-5.4 | GPT-5.4 | 0/1 FAILED | 59.6 min |
| mixed-harness | Opus 4.6 | GPT-5.4 | 11/13 on Sprint 4 | 60+ min |
| gemini-harness | Opus 4.6 | Gemini 3.1 Pro | 5/5 PASSED | 50.7 min |

Cross-model evaluation caught bugs that self-evaluation missed:

  • GPT-5.4 flagged incomplete OAuth token refresh that Claude gave itself a pass on
  • GPT-5.4 caught REPL token display showing zero after completed responses
  • Gemini ran the full test suite via tool calling before scoring; Sprint 4 failed on the first attempt

Structural fixes to harness logic

  • Contract negotiation is now iterative: generator proposes, evaluator reviews, up to 3 rounds of back-and-forth before finalizing
  • parseContract throws on malformed JSON instead of falling back to defaults - fail-closed, not fail-open
  • Retry loop triggers renegotiation when avgScore drops below 4 or all criteria are failing
  • Division-by-zero guard on empty feedback arrays
  • APPROVED check is now case-insensitive startsWith instead of fragile exact-match
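A minimal sketch of the fail-closed parsing and the relaxed approval check described above. The function names match the PR; the bodies are illustrative assumptions, not the actual implementation.

```typescript
interface Criterion { name: string; threshold: number }
interface Contract { criteria: Criterion[] }

// Throws on malformed input instead of silently substituting a default
// contract -- fail-closed, not fail-open.
function parseContract(raw: string): Contract {
  // Strip an optional ```json fenced block before parsing.
  const fenced = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
  const body = fenced ? fenced[1] : raw;
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    throw new Error("parseContract: malformed JSON (fail-closed)");
  }
  const contract = parsed as Contract;
  if (!Array.isArray(contract.criteria) || contract.criteria.length === 0) {
    throw new Error("parseContract: missing criteria (fail-closed)");
  }
  return contract;
}

// Case-insensitive startsWith instead of a fragile exact match, so
// "Approved." or "approved: ship it" still counts.
function isApproved(reply: string): boolean {
  return reply.trim().toUpperCase().startsWith("APPROVED");
}
```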

Gemini evaluator sandbox - 5 layers of defense

  • Command allowlist: no code execution binaries - node, bun, npm removed entirely
  • Git restricted to read-only subcommands: log, status, diff, show, ls-files, rev-parse
  • Dangerous flag blocking: find -exec, -execdir, -delete rejected
  • Path confinement: absolute paths outside workspace rejected for all commands
  • Symlink resolution: realpath prevents symlink-based escape from workspace
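The allowlist, path-confinement, and symlink layers can be sketched roughly as follows. This is an assumed shape for the handler, not the PR's actual code; the key ideas are `realpathSync` (so symlinks cannot escape the workspace) and `execFileSync` with an argument array (so nothing passes through a shell).

```typescript
import { execFileSync } from "node:child_process";
import { realpathSync } from "node:fs";
import * as path from "node:path";

// No code-execution binaries in the allowlist (no node/bun/npm).
const ALLOWED = new Set(["cat", "ls", "grep", "wc", "head", "tail"]);

// Resolve symlinks and reject anything that lands outside the workspace.
function confine(workspace: string, p: string): string {
  const resolved = realpathSync(path.resolve(workspace, p));
  const root = realpathSync(workspace);
  if (resolved !== root && !resolved.startsWith(root + path.sep)) {
    throw new Error(`path escapes workspace: ${p}`);
  }
  return resolved;
}

function runCommand(workspace: string, cmd: string, args: string[]): string {
  if (!ALLOWED.has(cmd)) throw new Error(`command not allowlisted: ${cmd}`);
  for (const a of args) {
    // Absolute paths are only allowed if they resolve inside the workspace.
    if (path.isAbsolute(a)) confine(workspace, a);
  }
  // Argument array + no shell means no interpolation or chaining tricks.
  return execFileSync(cmd, args, { cwd: workspace, encoding: "utf8" });
}
```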

New files

  • mixed-harness/ - 5 files: Claude generates code, Codex GPT-5.4 evaluates
  • gemini-harness/ - 5 files: Claude generates code, Gemini 3.1 Pro evaluates with tool calling
  • shared/conversation-logger.ts - Obsidian markdown and JSONL transcript export
  • tests/mixed-harness.test.ts - 22 tests
  • tests/conversation-logger.test.ts - 7 tests
  • RESULTS.md - full battle report with scores, timings, analysis
  • examples/gemini-run-excerpt.md - sample ConversationLogger output

Test plan

  • bun test - 29 tests passing
  • Claude harness: 5/5 sprints passed on real-world prompt
  • Gemini harness: 5/5 sprints passed on same prompt
  • Codex harness: confirmed failure mode documented
  • Mixed harness: 11/13 criteria passing on Sprint 4, adversarial evaluation working as designed
  • Verify mixed harness completes full run
  • Run harnesses on a second prompt to confirm generalization

1. Iterative negotiation: negotiateContract now runs up to 3 rounds
   of generator→evaluator back-and-forth instead of single-pass.
   Generator counter-proposes based on evaluator feedback until APPROVED.

2. Fail closed on malformed contracts: parseContract throws instead of
   silently falling back to generic 3-criterion default. Caller retries
   negotiation up to 2 times before propagating the error.

3. Renegotiate on bad criteria: retry loop now detects when all criteria
   are failing (avg score < 4 or all below threshold) and triggers
   contract renegotiation mid-sprint instead of burning retries against
   impossible criteria.

Applied to both claude-harness and codex-harness.
CodeRabbit review caught two issues:
1. Empty feedback array → division by zero → NaN avgScore
2. allFailing=true branch logged but never renegotiated (try block
   was inside else-if only)

Fix: add feedback.length > 0 guard, restructure to outer condition
gates renegotiation with inner if/else for accurate log messages.
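The trigger with both CodeRabbit fixes folded in can be sketched like this: guard the average against an empty feedback array, and keep `allFailing` in the same outer condition so that branch actually renegotiates. Names and thresholds here are illustrative assumptions.

```typescript
interface Feedback { criterion: string; score: number } // assumed shape

const PASS_THRESHOLD = 7;    // assumed per-criterion passing score
const RENEGOTIATE_AVG = 4;   // avg below this triggers renegotiation

function shouldRenegotiate(feedback: Feedback[]): boolean {
  // Guard: an empty array would make avg = 0/0 = NaN and every() vacuously true.
  if (feedback.length === 0) return false;
  const avg = feedback.reduce((s, f) => s + f.score, 0) / feedback.length;
  const allFailing = feedback.every(f => f.score < PASS_THRESHOLD);
  // Single outer condition gates renegotiation, so neither branch is dead code.
  return avg < RENEGOTIATE_AVG || allFailing;
}
```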

New mixed-harness/ — cross-model adversarial dev inspired by GAN architecture:
- Generator (Claude Opus 4.6) builds code against sprint contracts
- Evaluator (Codex GPT-5.4) rips apart the work in fresh context
- Zero sycophancy: evaluator has no emotional investment in the code

Includes all 3 hardening fixes from parent harnesses:
- Iterative contract negotiation (3 rounds)
- Fail-closed contract parsing (throws on garbage)
- Mid-sprint renegotiation when all criteria fail

ConversationLogger (shared/conversation-logger.ts):
- Captures every agent prompt, response, tool call, score, and error
- Saves as Obsidian-friendly markdown (.md) + machine-readable JSONL
- Default output: agent-brain-vault/Projects/brane-code/debates/
- Collapsible tool calls, score badges, duration tracking
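A stripped-down sketch of the dual export. The real shared/conversation-logger.ts also tracks tool calls, durations, and disk output; this assumed shape only shows the markdown/JSONL split and the empty-log behavior.

```typescript
interface Entry { role: string; content: string; score?: number }

class ConversationLogger {
  private entries: Entry[] = [];

  log(entry: Entry): void {
    this.entries.push(entry);
  }

  // One JSON object per line; an empty log yields "" rather than a bare newline.
  toJSONL(): string {
    return this.entries.map(e => JSON.stringify(e)).join("\n");
  }

  // Obsidian-friendly markdown: one heading per exchange, score as a badge.
  toMarkdown(): string {
    return this.entries
      .map(e => `## ${e.role}${e.score !== undefined ? ` \`score: ${e.score}\`` : ""}\n\n${e.content}`)
      .join("\n\n");
  }
}
```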

Tests (29 passing):
- parseContract: 9 tests (fail-closed, code blocks, garbage rejection)
- Renegotiation trigger: 7 tests (thresholds, division-by-zero guard)
- parseEvalResult: 3 tests (threshold recalculation, extraction)
- Negotiation rounds: 3 tests (early approval, max rounds)
- ConversationLogger: 7 tests (entries, markdown, JSONL, disk save)
New gemini-harness/ — cross-company adversarial dev:
- Generator: Claude Opus 4.6 (Anthropic) builds code via Agent SDK
- Evaluator: Gemini 3.1 Pro Preview (Google) rips it apart via @google/genai
- Gemini evaluator has tool calling: readFile, runCommand, listFiles
- Multi-turn chat loop handles tool calls until Gemini is done evaluating
- 1M context on BOTH sides — true heavyweight matchup
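The multi-turn loop can be sketched generically as below. The `Model` interface here is a stand-in, not the real @google/genai chat signature; the point is the control flow: keep answering tool calls until the model replies with plain text.

```typescript
interface ToolCall { name: string; args: Record<string, string> }
interface Turn { text?: string; toolCalls: ToolCall[] }
interface Model { send(message: string): Promise<Turn> }  // stand-in for the SDK chat

type ToolHandler = (args: Record<string, string>) => string;

async function evaluate(
  model: Model,
  tools: Record<string, ToolHandler>,
  prompt: string,
): Promise<string> {
  let turn = await model.send(prompt);
  // Loop until the evaluator stops requesting tools and emits its verdict.
  while (turn.toolCalls.length > 0) {
    const results = turn.toolCalls.map(call => {
      const handler = tools[call.name];
      return handler ? handler(call.args) : `unknown tool: ${call.name}`;
    });
    turn = await model.send(JSON.stringify(results));
  }
  return turn.text ?? "";
}
```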

Same 3 hardening fixes as other harnesses:
- Iterative contract negotiation (3 rounds)
- Fail-closed contract parsing
- Mid-sprint renegotiation on bad criteria

Includes ConversationLogger integration for full transcript logging.

SDK: @google/genai@1.48.0
Model: gemini-3.1-pro-preview (1M input, 65K output)
- Gemini evaluator: sandbox tool handlers with path confinement,
  command allowlisting (execFileSync instead of execSync), and
  fs.readdir instead of shell-interpolated find
- Fix fragile "APPROVED" exact-match in all 4 harness negotiation
  loops — now case-insensitive startsWith
- Remove personal vault path from logDir defaults (use ./logs)
- Remove brane-streaming-fix.md (task doc, not project code)
- Remove unused imports in mixed/gemini harnesses
- Fix copy-paste comment ("Codex" -> "Gemini" in gemini harness)
- Gemini sandbox: use realpath() instead of resolve() to prevent
  symlink-based path traversal escape
- Gemini sandbox: reject runCommand args with absolute paths outside
  workspace (prevents `cat /etc/passwd`, `grep -r secret /etc`)
- All evaluators: guard against empty feedback array where
  [].every() returns true — prevents silent false pass
- ConversationLogger: empty JSONL returns "" not bare newline
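The `[].every()` footgun fixed above is worth a two-line illustration: `every` on an empty array is vacuously true, so an "all criteria pass" check must also require at least one criterion.

```typescript
const scores: number[] = []; // evaluator returned no feedback

// Vacuous truth: every() on [] is true, reporting a silent false pass.
const naivePass = scores.every(s => s >= 7);

// Guarded: an empty result can never count as passing.
const guardedPass = scores.length > 0 && scores.every(s => s >= 7);
```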
Security (Gemini evaluator sandbox):
- Remove node/npm/npx/bun/bunx from runCommand allowlist — prevents
  arbitrary code execution via `node -e "..."`
- Restrict git to read-only subcommands (log, status, diff, show,
  ls-files, rev-parse) — prevents data exfiltration via git push
- Block find -exec/-execdir/-delete flags — prevents subprocess spawn
- Keep path containment (realpath + absolute path rejection) from
  previous commit
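The git and find rules above amount to a per-command argument check, roughly like this (names and structure are assumptions): a subcommand allowlist for git and a flag denylist for find, applied before anything is executed.

```typescript
// Read-only git subcommands -- no push/fetch/remote, so no exfiltration channel.
const GIT_READONLY = new Set(["log", "status", "diff", "show", "ls-files", "rev-parse"]);

// find flags that spawn subprocesses or delete files.
const FIND_BLOCKED = new Set(["-exec", "-execdir", "-delete"]);

function checkArgs(cmd: string, args: string[]): void {
  if (cmd === "git" && !GIT_READONLY.has(args[0] ?? "")) {
    throw new Error(`git ${args[0] ?? ""}: only read-only subcommands allowed`);
  }
  if (cmd === "find" && args.some(a => FIND_BLOCKED.has(a))) {
    throw new Error("find: -exec/-execdir/-delete are blocked");
  }
}
```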

Polish:
- Fix wrong evaluator name in gemini generator prompt ("Codex" -> generic)
- Remove unused readContract import from claude-harness
- Fix README: wrong default model (sonnet -> opus), add mixed/gemini
  harness sections to Quick Start and Project Structure
RESULTS.md — full scoreboard, key findings, and analysis from 4 harness
runs (Claude 5/5, Codex 0/1, Mixed 11/13 on S4, Gemini 5/5).

examples/gemini-run-excerpt.md — first 150 lines of actual Gemini
evaluator conversation log showing the ConversationLogger output format.
